Overview

Dataset statistics

Number of variables: 11
Number of observations: 1106
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 110
Duplicate rows (%): 9.9%
Total size in memory: 95.2 KiB
Average record size in memory: 88.1 B
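The two derived figures in this table follow from the raw counts. The sketch below recomputes them from the reported numbers only; the variable names are illustrative and not part of the report.

```python
# Values copied from the "Dataset statistics" table.
n_rows = 1106
n_duplicate_rows = 110
total_size_kib = 95.2

# Share of rows flagged as duplicates, in percent.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```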

Variable types

NUM: 10
BOOL: 1

Warnings

Duplicates: Dataset has 110 (9.9%) duplicate rows

Reproduction

Analysis started: 2020-09-19 14:42:22.539484
Analysis finished: 2020-09-19 14:42:58.514735
Duration: 35.98 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

freight_value
Real number (ℝ≥0)

Distinct: 470
Distinct (%): 42.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.20873418
Minimum: 0.03
Maximum: 171.88
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 0.03
5-th percentile: 7.78
Q1: 12.55
Median: 15.1
Q3: 17.6
95-th percentile: 36.5675
Maximum: 171.88
Range: 171.85
Interquartile range (IQR): 5.05

Descriptive statistics

Standard deviation: 13.32592201
Coefficient of variation (CV): 0.7743696816
Kurtosis: 35.77575219
Mean: 17.20873418
Median Absolute Deviation (MAD): 2.55
Skewness: 5.114416
Sum: 19032.86
Variance: 177.5801973
Monotonicity: Not monotonic
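Several of these statistics are simple functions of the others. As a sanity check, the sketch below recomputes the range, IQR, coefficient of variation and variance for freight_value from the reported values; the variable names are illustrative only.

```python
# Values copied from the freight_value statistics.
mean, std = 17.20873418, 13.32592201
q1, q3 = 12.55, 17.6
minimum, maximum = 0.03, 171.88

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # CV = standard deviation / mean
variance = std ** 2               # Variance = standard deviation squared

print(round(value_range, 2), round(iqr, 2), round(cv, 4), round(variance, 4))
```

The same relationships hold for every numeric variable in the report, so the check can be repeated with the other variables' values.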
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
7.78                  37     3.3%
15.1                  31     2.8%
14.1                  28     2.5%
15.23                 25     2.3%
11.85                 25     2.3%
12.55                 23     2.1%
15.8                  21     1.9%
9.34                  16     1.4%
13.62                 14     1.3%
7.39                  12     1.1%
8.27                  11     1.0%
16.68                 9      0.8%
12.66                 9      0.8%
13.37                 9      0.8%
18.23                 9      0.8%
12.61                 8      0.7%
15.27                 8      0.7%
8.72                  7      0.6%
15.65                 7      0.6%
10.96                 7      0.6%
15.92                 7      0.6%
12.62                 6      0.5%
15.79                 6      0.5%
11.15                 6      0.5%
12.99                 6      0.5%
Other values (445)    759    68.6%

Value                 Count  Frequency (%)
0.03                  3      0.3%
0.48                  1      0.1%
7.39                  12     1.1%
7.4                   2      0.2%
7.41                  6      0.5%
7.42                  6      0.5%
7.43                  2      0.2%
7.45                  2      0.2%
7.48                  2      0.2%
7.49                  2      0.2%

Value                 Count  Frequency (%)
171.88                1      0.1%
121.22                2      0.2%
106.11                1      0.1%
104.77                4      0.4%
100.75                1      0.1%
86.06                 2      0.2%
83.25                 1      0.1%
80.43                 1      0.1%
79.63                 1      0.1%
77.98                 1      0.1%

review_score
Real number (ℝ≥0)

Distinct: 5
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.722423146
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.763322631
Coefficient of variation (CV): 0.6477033644
Kurtosis: -1.732711641
Mean: 2.722423146
Median Absolute Deviation (MAD): 1
Skewness: 0.2427037343
Sum: 3011
Variance: 3.109306702
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=5)
Value                 Count  Frequency (%)
1                     509    46.0%
5                     324    29.3%
4                     126    11.4%
3                     84     7.6%
2                     63     5.7%

Value                 Count  Frequency (%)
1                     509    46.0%
2                     63     5.7%
3                     84     7.6%
4                     126    11.4%
5                     324    29.3%

Value                 Count  Frequency (%)
5                     324    29.3%
4                     126    11.4%
3                     84     7.6%
2                     63     5.7%
1                     509    46.0%

product_photos_qty
Real number (ℝ≥0)

Distinct: 9
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.12477396
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 3
95-th percentile: 5
Maximum: 9
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.542396676
Coefficient of variation (CV): 0.7259109461
Kurtosis: 1.62843218
Mean: 2.12477396
Median Absolute Deviation (MAD): 0
Skewness: 1.43005303
Sum: 2350
Variance: 2.378987505
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value                 Count  Frequency (%)
1                     585    52.9%
2                     194    17.5%
4                     148    13.4%
3                     92     8.3%
5                     39     3.5%
6                     32     2.9%
7                     8      0.7%
8                     6      0.5%
9                     2      0.2%

Value                 Count  Frequency (%)
1                     585    52.9%
2                     194    17.5%
3                     92     8.3%
4                     148    13.4%
5                     39     3.5%
6                     32     2.9%
7                     8      0.7%
8                     6      0.5%
9                     2      0.2%

Value                 Count  Frequency (%)
9                     2      0.2%
8                     6      0.5%
7                     8      0.7%
6                     32     2.9%
5                     39     3.5%
4                     148    13.4%
3                     92     8.3%
2                     194    17.5%
1                     585    52.9%

product_weight_g
Real number (ℝ≥0)

Distinct: 189
Distinct (%): 17.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1176.650995
Minimum: 50
Maximum: 30000
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 50
5-th percentile: 100
Q1: 180
Median: 250
Q3: 700
95-th percentile: 5937.5
Maximum: 30000
Range: 29950
Interquartile range (IQR): 520

Descriptive statistics

Standard deviation: 2835.931706
Coefficient of variation (CV): 2.410172361
Kurtosis: 30.17694717
Mean: 1176.650995
Median Absolute Deviation (MAD): 100
Skewness: 4.878108163
Sum: 1301376
Variance: 8042508.638
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
250                   219    19.8%
200                   114    10.3%
150                   73     6.6%
167                   38     3.4%
173                   34     3.1%
175                   29     2.6%
900                   29     2.6%
500                   29     2.6%
100                   28     2.5%
400                   28     2.5%
180                   27     2.4%
50                    24     2.2%
600                   18     1.6%
1000                  17     1.5%
300                   16     1.4%
1025                  14     1.3%
156                   14     1.3%
183                   12     1.1%
700                   10     0.9%
550                   10     0.9%
1590                  9      0.8%
800                   9      0.8%
450                   8      0.7%
350                   7      0.6%
315                   7      0.6%
Other values (164)    283    25.6%

Value                 Count  Frequency (%)
50                    24     2.2%
60                    1      0.1%
65                    3      0.3%
75                    1      0.1%
85                    2      0.2%
90                    3      0.3%
100                   28     2.5%
110                   1      0.1%
115                   1      0.1%
125                   1      0.1%

Value                 Count  Frequency (%)
30000                 1      0.1%
28500                 1      0.1%
22800                 1      0.1%
20850                 1      0.1%
18550                 1      0.1%
16515                 1      0.1%
16000                 2      0.2%
15900                 1      0.1%
15550                 2      0.2%
14813                 1      0.1%

product_length_cm
Real number (ℝ≥0)

Distinct: 54
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.03616637
Minimum: 14
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 14
5-th percentile: 16
Q1: 16
Median: 18
Q3: 28
95-th percentile: 50
Maximum: 105
Range: 91
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 13.32281252
Coefficient of variation (CV): 0.5542819231
Kurtosis: 9.452092501
Mean: 24.03616637
Median Absolute Deviation (MAD): 2
Skewness: 2.703695541
Sum: 26584
Variance: 177.4973333
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
16                    368    33.3%
17                    141    12.7%
18                    92     8.3%
20                    83     7.5%
30                    48     4.3%
40                    45     4.1%
19                    28     2.5%
21                    27     2.4%
22                    22     2.0%
25                    16     1.4%
38                    15     1.4%
23                    15     1.4%
35                    15     1.4%
36                    13     1.2%
24                    13     1.2%
50                    11     1.0%
31                    10     0.9%
28                    9      0.8%
60                    9      0.8%
45                    9      0.8%
33                    9      0.8%
26                    8      0.7%
29                    8      0.7%
39                    7      0.6%
37                    7      0.6%
Other values (29)     78     7.1%

Value                 Count  Frequency (%)
14                    6      0.5%
15                    1      0.1%
16                    368    33.3%
17                    141    12.7%
18                    92     8.3%
19                    28     2.5%
20                    83     7.5%
21                    27     2.4%
22                    22     2.0%
23                    15     1.4%

Value                 Count  Frequency (%)
105                   4      0.4%
100                   3      0.3%
82                    2      0.2%
80                    2      0.2%
78                    1      0.1%
77                    2      0.2%
75                    1      0.1%
70                    1      0.1%
69                    1      0.1%
68                    4      0.4%

product_height_cm
Real number (ℝ≥0)

Distinct: 55
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.35985533
Minimum: 2
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 5
Median: 11
Q3: 16
95-th percentile: 35
Maximum: 105
Range: 103
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 12.60972754
Coefficient of variation (CV): 0.9438521018
Kurtosis: 14.66356462
Mean: 13.35985533
Median Absolute Deviation (MAD): 6
Skewness: 3.02016623
Sum: 14776
Variance: 159.0052286
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
2                     230    20.8%
11                    103    9.3%
13                    100    9.0%
10                    77     7.0%
20                    66     6.0%
12                    56     5.1%
16                    48     4.3%
18                    36     3.3%
8                     35     3.2%
6                     28     2.5%
5                     27     2.4%
4                     25     2.3%
14                    25     2.3%
9                     24     2.2%
15                    22     2.0%
7                     19     1.7%
25                    17     1.5%
17                    17     1.5%
30                    15     1.4%
35                    12     1.1%
3                     12     1.1%
19                    12     1.1%
24                    11     1.0%
40                    11     1.0%
47                    6      0.5%
Other values (30)     72     6.5%

Value                 Count  Frequency (%)
2                     230    20.8%
3                     12     1.1%
4                     25     2.3%
5                     27     2.4%
6                     28     2.5%
7                     19     1.7%
8                     35     3.2%
9                     24     2.2%
10                    77     7.0%
11                    103    9.3%

Value                 Count  Frequency (%)
105                   4      0.4%
96                    1      0.1%
87                    1      0.1%
80                    1      0.1%
77                    1      0.1%
66                    2      0.2%
65                    1      0.1%
63                    1      0.1%
61                    1      0.1%
60                    3      0.3%

product_width_cm
Real number (ℝ≥0)

Distinct: 53
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.47377939
Minimum: 11
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 11
5-th percentile: 11
Q1: 13
Median: 16
Q3: 20
95-th percentile: 40
Maximum: 105
Range: 94
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 10.70677629
Coefficient of variation (CV): 0.5498047439
Kurtosis: 11.12146434
Mean: 19.47377939
Median Absolute Deviation (MAD): 4
Skewness: 2.728459791
Sum: 21538
Variance: 114.6350585
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
20                    182    16.5%
11                    174    15.7%
15                    129    11.7%
12                    89     8.0%
13                    87     7.9%
14                    68     6.1%
16                    52     4.7%
40                    43     3.9%
18                    38     3.4%
25                    31     2.8%
17                    29     2.6%
30                    28     2.5%
23                    18     1.6%
35                    14     1.3%
32                    12     1.1%
24                    11     1.0%
31                    8      0.7%
33                    7      0.6%
36                    6      0.5%
22                    6      0.5%
19                    6      0.5%
45                    6      0.5%
21                    6      0.5%
26                    5      0.5%
29                    4      0.4%
Other values (28)     47     4.2%

Value                 Count  Frequency (%)
11                    174    15.7%
12                    89     8.0%
13                    87     7.9%
14                    68     6.1%
15                    129    11.7%
16                    52     4.7%
17                    29     2.6%
18                    38     3.4%
19                    6      0.5%
20                    182    16.5%

Value                 Count  Frequency (%)
105                   1      0.1%
100                   1      0.1%
82                    1      0.1%
79                    1      0.1%
73                    1      0.1%
68                    1      0.1%
67                    2      0.2%
63                    2      0.2%
62                    2      0.2%
61                    1      0.1%

payment_value
Real number (ℝ≥0)

Distinct: 684
Distinct (%): 61.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 180.947387
Minimum: 0.33
Maximum: 4809.44
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 0.33
5-th percentile: 30
Q1: 68.19
Median: 88.91
Q3: 162.55
95-th percentile: 536.0075
Maximum: 4809.44
Range: 4809.11
Interquartile range (IQR): 94.36

Descriptive statistics

Standard deviation: 354.3498513
Coefficient of variation (CV): 1.958303224
Kurtosis: 64.53082843
Mean: 180.947387
Median Absolute Deviation (MAD): 35.4
Skewness: 6.868153021
Sum: 200127.81
Variance: 125563.8171
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
165.8                 20     1.8%
162.55                18     1.6%
85.62                 14     1.3%
86.9                  10     0.9%
69.14                 8      0.7%
93.65                 7      0.6%
37.75                 6      0.5%
64.17                 6      0.5%
258                   6      0.5%
81.51                 6      0.5%
90.99                 6      0.5%
93.3                  6      0.5%
85.41                 6      0.5%
140.16                6      0.5%
1949.52               6      0.5%
133.98                6      0.5%
87.32                 6      0.5%
62.78                 6      0.5%
9.39                  6      0.5%
63.1                  6      0.5%
144.3                 5      0.5%
975.05                5      0.5%
89.28                 5      0.5%
1950.2                5      0.5%
73.49                 5      0.5%
Other values (659)    920    83.2%

Value                 Count  Frequency (%)
0.33                  1      0.1%
3.77                  1      0.1%
6.42                  1      0.1%
6.48                  1      0.1%
7.1                   1      0.1%
7.32                  1      0.1%
8.26                  1      0.1%
8.79                  1      0.1%
9.39                  6      0.5%
9.46                  1      0.1%

Value                 Count  Frequency (%)
4809.44               2      0.2%
2442.82               1      0.1%
2419.2                1      0.1%
2404.72               2      0.2%
2026.54               1      0.1%
1950.2                5      0.5%
1949.52               6      0.5%
1841.75               1      0.1%
1818.23               1      0.1%
1488.14               1      0.1%

category_count
Real number (ℝ≥0)

Distinct: 46
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6614.779385
Minimum: 39
Maximum: 11990
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 39
5-th percentile: 708.5
Q1: 4726
Median: 6213
Q3: 8833
95-th percentile: 10030
Maximum: 11990
Range: 11951
Interquartile range (IQR): 4107

Descriptive statistics

Standard deviation: 2753.253071
Coefficient of variation (CV): 0.4162274977
Kurtosis: -0.2518930364
Mean: 6614.779385
Median Absolute Deviation (MAD): 1938
Skewness: -0.5960062589
Sum: 7315946
Variance: 7580402.474
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=46)
Value                 Count  Frequency (%)
6213                  227    20.5%
8151                  200    18.1%
10030                 144    13.0%
9005                  77     7.0%
7380                  61     5.5%
4726                  58     5.2%
8833                  52     4.7%
4281                  34     3.1%
4400                  32     2.9%
3589                  24     2.2%
3204                  20     1.8%
4590                  20     1.8%
11990                 18     1.6%
3999                  17     1.5%
2625                  14     1.3%
2847                  11     1.0%
719                   11     1.0%
1192                  10     0.9%
705                   8      0.7%
565                   7      0.6%
2170                  6      0.5%
2030                  5      0.5%
199                   5      0.5%
247                   4      0.4%
1163                  4      0.4%
Other values (21)     37     3.3%

Value                 Count  Frequency (%)
39                    1      0.1%
71                    3      0.3%
145                   1      0.1%
155                   1      0.1%
199                   5      0.5%
219                   1      0.1%
247                   4      0.4%
271                   3      0.3%
272                   4      0.4%
278                   2      0.2%

Value                 Count  Frequency (%)
11990                 18     1.6%
10030                 144    13.0%
9005                  77     7.0%
8833                  52     4.7%
8151                  200    18.1%
7380                  61     5.5%
6213                  227    20.5%
4726                  58     5.2%
4590                  20     1.8%
4400                  32     2.9%

Days_to_deliver
Real number (ℝ≥0)

Distinct: 962
Distinct (%): 87.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.89471047
Minimum: 2.22818287
Maximum: 144.8952431
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.8 KiB

Quantile statistics

Minimum: 2.22818287
5-th percentile: 10.53071181
Q1: 17.98866609
Median: 22.4016088
Q3: 26.5552662
95-th percentile: 35.14960648
Maximum: 144.8952431
Range: 142.6670602
Interquartile range (IQR): 8.566600116

Descriptive statistics

Standard deviation: 9.536865519
Coefficient of variation (CV): 0.4165532266
Kurtosis: 32.11593651
Mean: 22.89471047
Median Absolute Deviation (MAD): 4.328680556
Skewness: 3.423355991
Sum: 25321.54978
Variance: 90.95180392
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
19.46091435           9      0.8%
22.4016088            7      0.6%
18.66866898           6      0.5%
28.52240741           6      0.5%
20.58934028           6      0.5%
13.08435185           6      0.5%
15.41814815           5      0.5%
24.41399306           5      0.5%
18.38804398           5      0.5%
24.45289352           5      0.5%
22.20247685           5      0.5%
15.28127315           5      0.5%
15.01650463           5      0.5%
32.05636574           5      0.5%
13.60278935           4      0.4%
23.56354167           4      0.4%
21.25967593           4      0.4%
18.21923611           3      0.3%
34.32572917           3      0.3%
23.08686343           3      0.3%
41.2146412            3      0.3%
26.32194444           3      0.3%
5.371261574           2      0.2%
18.05878472           2      0.2%
29.03747685           2      0.2%
Other values (937)    993    89.8%

Value                 Count  Frequency (%)
2.22818287            1      0.1%
2.251909722           1      0.1%
2.549513889           1      0.1%
3.364131944           1      0.1%
4.056122685           1      0.1%
4.305648148           1      0.1%
4.709143519           1      0.1%
4.961180556           1      0.1%
5.371261574           2      0.2%
5.52962963            1      0.1%

Value                 Count  Frequency (%)
144.8952431           1      0.1%
106.992338            1      0.1%
78.04864583           1      0.1%
70.38107639           1      0.1%
68.44149306           1      0.1%
63.52545139           2      0.2%
60.60930556           1      0.1%
60.52143519           1      0.1%
59.34726852           1      0.1%
57.46989583           1      0.1%

TargetVar
Boolean

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.8 KiB
Value                 Count  Frequency (%)
1                     553    50.0%
0                     553    50.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a perfectly linear relationship the slope of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
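The definition above can be sketched in plain Python. This is an illustrative implementation, not the code pandas-profiling itself runs; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/(n-1) factors in the covariance and in both standard deviations
    # cancel, so sums of centered products are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9]
r = pearson_r(x, y)
# Shifting and rescaling x by a positive factor leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], y)
print(r, r_shifted)
```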

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
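As a sketch of that recipe, the code below ranks each variable and then applies Pearson's formula to the ranks. Giving tied values the average of their ranks is an assumption (a common convention); the function names and sample data are illustrative only.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# y = x**3 is monotonic but not linear, so rho reaches 1 while r stays below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```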

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
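The pair-counting formula can be sketched directly. The version below is the simple form described in the text (often called tau-a), which leaves tied pairs in the denominator only; statistical libraries commonly apply a tie correction, so their results may differ when ties are present. The function name and sample data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0 is a tie: it counts toward the total only
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Hypothetical data: one swapped pair out of six gives (5 - 1) / 6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```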

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

index  freight_value  review_score  product_photos_qty  product_weight_g  product_length_cm  product_height_cm  product_width_cm  payment_value  category_count  Days_to_deliver  TargetVar
0      21.10          5             1.0                 1383.0            50.0               10.0               40.0              108.00         11990           23.389641        0
1      15.38          1             1.0                 200.0             16.0               12.0               11.0              55.28          7380            13.396435        0
2      9.98           1             1.0                 173.0             18.0               13.0               12.0              90.98          8151            9.419051         0
3      15.38          1             1.0                 173.0             18.0               13.0               12.0              105.17         8151            27.405382        0
4      8.33           1             1.0                 321.0             19.0               14.0               13.0              93.22          8151            23.039942        0
5      8.33           1             1.0                 321.0             19.0               14.0               13.0              186.44         8151            22.511319        0
6      8.33           1             1.0                 321.0             19.0               14.0               13.0              186.44         8151            22.511319        0
7      17.60          1             2.0                 1750.0            37.0               22.0               40.0              59.24          4590            31.618437        0
8      17.60          1             2.0                 1750.0            37.0               22.0               40.0              8.26           4590            31.618437        0
9      40.37          1             1.0                 6550.0            20.0               20.0               20.0              189.37         8151            30.375370        0

Last rows

index  freight_value  review_score  product_photos_qty  product_weight_g  product_length_cm  product_height_cm  product_width_cm  payment_value  category_count  Days_to_deliver  TargetVar
1096   15.23          5             2.0                 175.0             19.0               11.0               14.0              8.79           4726            22.401609        1
1097   7.42           3             1.0                 150.0             16.0               16.0               16.0              87.32          8151            18.175567        1
1098   15.31          1             4.0                 188.0             17.0               6.0                12.0              45.30          4726            14.495000        1
1099   15.10          3             2.0                 250.0             16.0               2.0                11.0              97.20          6213            26.319120        1
1100   18.23          5             1.0                 183.0             18.0               3.0                11.0              39.22          4726            26.220694        1
1101   15.10          5             1.0                 150.0             16.0               16.0               11.0              41.00          9005            21.378461        1
1102   15.34          5             1.0                 180.0             17.0               10.0               13.0              99.34          8151            28.329491        1
1103   17.03          5             1.0                 167.0             16.0               11.0               15.0              102.03         8151            30.629097        1
1104   13.61          4             1.0                 150.0             16.0               14.0               11.0              83.51          8151            29.542338        1
1105   22.06          5             4.0                 250.0             16.0               2.0                20.0              66.06          6213            30.063160        1

Duplicate rows

Most frequent

index  freight_value  review_score  product_photos_qty  product_weight_g  product_length_cm  product_height_cm  product_width_cm  payment_value  category_count  Days_to_deliver  TargetVar  count
12     9.34           1             1.0                 900.0             40.0               5.0                40.0              133.98         8833            18.668669        0          6
23     13.37          2             1.0                 900.0             40.0               8.0                40.0              140.16         8833            28.522407        1          6
32     15.23          5             2.0                 175.0             19.0               11.0               14.0              9.39           4726            22.401609        1          6
36     15.92          1             1.0                 315.0             14.0               13.0               13.0              1949.52        4590            20.589340        0          6
1      7.39           1             1.0                 50.0              20.0               5.0                15.0              86.90          3204            15.418148        0          5
5      7.78           1             1.0                 50.0              20.0               20.0               18.0              211.40         8151            15.281273        0          5
15     10.96          2             6.0                 400.0             16.0               12.0               12.0              144.30         7380            32.056366        0          5
17     11.85          1             2.0                 100.0             28.0               17.0               11.0              86.90          7380            22.202477        0          5
24     14.08          1             2.0                 900.0             40.0               8.0                40.0              120.35         8833            18.388044        1          5
27     15.01          1             1.0                 400.0             35.0               35.0               25.0              975.05         8151            24.413993        0          5
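A table like the one above can be built by counting identical rows and keeping those that occur more than once. The sketch below uses only the standard library; the function name and the toy rows are hypothetical, and this is not necessarily how pandas-profiling computes its table.

```python
from collections import Counter

def most_frequent_duplicates(rows, top=10):
    """Return (row, count) pairs for rows that occur more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    duplicated = [(row, n) for row, n in counts.items() if n > 1]
    duplicated.sort(key=lambda item: item[1], reverse=True)
    return duplicated[:top]

# Toy rows of (freight_value, review_score, TargetVar); not taken from the dataset.
rows = [
    (9.34, 1, 0),
    (7.39, 1, 0),
    (9.34, 1, 0),
    (15.23, 5, 1),
    (9.34, 1, 0),
    (7.39, 1, 0),
]
for row, count in most_frequent_duplicates(rows):
    print(row, count)
```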